Abstract: In cloud computing, Infrastructure as a Service (IaaS) is a technology that delivers computing resources, networking, and storage to consumers on demand over the internet. It enables end users to scale resources up or down on an as-needed basis, reducing up-front capital expenditures and unnecessary infrastructure. In this paper, we propose an efficient additive encryption scheme based on Shamir's secret sharing for securing data warehouses in the cloud. It addresses the shortcomings of existing approaches by reducing overhead while still enforcing good data privacy.
1. Introduction 


The continued advancement of information technology and data communications increases the exchange of highly sensitive medical information. Electronic health systems are widely used, and many medical facilities rely on transmitting and receiving medical information online and over local networks. Over the years, many security systems have been introduced to protect patient privacy and ensure the safety of exchanged medical data.


Cryptography is one of the techniques that often provides security for eHealth systems [1-5]. 


Nowadays, data outsourcing scenarios are growing tremendously with the advent of cloud computing, which offers both cost savings and service benefits. One of the most noteworthy cloud outsourcing services is Database-as-a-Service, where individuals and organizations outsource data storage and management to a Cloud Service Provider (CSP) [6-17]. Naturally, such services allow outsourcing a data warehouse (DW) and running Online Analytical Processing (OLAP) queries. Yet, data outsourcing raises privacy concerns, since sensitive data are stored, maintained and processed by an external third party that may not be fully trusted.


A typical solution to preserve data privacy is encrypting data locally before sending them to an external server. Secure database management systems (SDBMSs) such as CryptDB implement cryptographic schemes [18-29]. Paillier's partially homomorphic encryption scheme is notably used in CryptDB to compute sums over encrypted data with a high level of security. However, it induces a high storage and computation overhead. We propose a new Secure Secret Splitting Scheme (S4) that aims at replacing Paillier's scheme in systems such as CryptDB. S4 is based on the idea of secret sharing and is efficient both in terms of storage and computation, without sacrificing too much privacy.


CryptDB brings together powerful cryptographic tools to process queries over encrypted data without decryption. Encryption in CryptDB is organized like the layers of an onion, which nest multiple ciphertexts, i.e., encrypted data, within each other. Each onion layer enables a certain kind of query processing and a given security level, provided by one encryption scheme [30-43]. For instance, order-preserving encryption (OPE) enables range queries and additive homomorphic encryption enables addition over encrypted data. Yet, CryptDB is not perfectly secure, since schemes such as OPE reveal some statistical information about plaintexts. MONOMI builds upon CryptDB to allow the execution of analytical workloads over encrypted data outsourced to the cloud. MONOMI aims at improving CryptDB's query processing capability and efficiency through split client/server execution. MONOMI also includes a designer that optimizes the physical data layout [44-61]. Finally, using trusted hardware at the CSP's, as in TrustedDB and CipherBase, is an alternative approach to querying encrypted data. However, trusted hardware is limited in computing power and memory capacity, and is also very expensive.


Fully homomorphic encryption (FHE) allows performing arbitrary arithmetic operations over encrypted data without decryption. FHE provides semantic security, i.e., it is computationally infeasible to tell whether two ciphertexts encrypt the same plaintext. However, FHE requires so much computing power that it cannot be used in practice. Partially homomorphic encryption (PHE) is more efficient than FHE. Paillier's scheme is the most efficient additive PHE. With Paillier's scheme, multiplying the encryptions of two values results in an encryption of the sum of the values, i.e., $\mathrm{Enc}_k(x) \cdot \mathrm{Enc}_k(y) = \mathrm{Enc}_k(x + y)$, where the multiplication is performed modulo $n^2$, $n$ being the modulus of public key $k$. Paillier's scheme is, however, still computationally intensive and induces ciphertexts as large as 2048 bits. Additionally, modular multiplications become computationally expensive over a large number of records, such as in the fact table of a DW.
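As a concrete illustration of the additive property, the following toy Python sketch (Python 3.9+) implements Paillier's scheme with the common simplification $g = n + 1$ and deliberately tiny primes of our choosing, which makes it completely insecure. The function names are hypothetical and do not mirror libpaillier's C API. Multiplying two ciphertexts modulo $n^2$ yields a ciphertext of the sum of the plaintexts.

```python
import math
import secrets

# Toy parameters (assumption): tiny primes, insecure, for illustration only.
# With a 1024-bit modulus n, ciphertexts live modulo n^2 and are 2048 bits long.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
G = N + 1                         # simplified generator choice g = n + 1
LAMBDA = math.lcm(P - 1, Q - 1)   # Carmichael function of n

def _L(u):
    """Paillier's L function: L(u) = (u - 1) / n."""
    return (u - 1) // N

MU = pow(_L(pow(G, LAMBDA, N_SQ)), -1, N)

def encrypt(m):
    """Encrypt 0 <= m < n with fresh randomness r coprime with n."""
    while True:
        r = 1 + secrets.randbelow(N - 1)
        if math.gcd(r, N) == 1:
            break
    return pow(G, m, N_SQ) * pow(r, N, N_SQ) % N_SQ

def decrypt(c):
    return _L(pow(c, LAMBDA, N_SQ)) * MU % N

def add_encrypted(c1, c2):
    """Homomorphic addition: the product of ciphertexts mod n^2 encrypts the sum."""
    return c1 * c2 % N_SQ

c = add_encrypted(encrypt(20), encrypt(22))
print(decrypt(c))  # 42
```

Each homomorphic addition costs one multiplication of numbers of the size of $n^2$, which is the operation whose cost grows with the number of records summed.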


Secret sharing divides a secret piece of data into so-called shares that are stored at $n$ participants. A subset of $k \le n$ participants is required to reconstruct the secret. In Shamir's scheme, the secret is the constant term of a random polynomial of degree $k-1$, and each share is one point of this polynomial. S4's driving idea is based on secret sharing, but instead of distributing shares to $n$ participants or CSPs, they are stored at one single CSP's. Thus, we avoid the high storage overhead of secret sharing. In S4, each secret $v_j$ is divided into $n = k$ splits $v_{1j}, \ldots, v_{kj}$. The $k-1$ splits $v_{1j}, \ldots, v_{(k-1)j}$ are stored at the CSP's, and $v_{kj}$ is stored on a trusted machine, e.g., at the user's. In order to reduce storage overhead at the user's, $v_{kj}$ is set to be the same for all secrets, and is denoted $v_k$ hereafter.
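For comparison with S4, here is a minimal Python sketch of classical $(k, n)$ Shamir secret sharing over a prime field. The prime $p = 2^{61} - 1$ and the function names are our own illustrative assumptions. Any $k$ of the $n$ shares reconstruct the secret by Lagrange interpolation at $x = 0$.

```python
import secrets

PRIME = 2**61 - 1  # a Mersenne prime; the field size is an assumption

def shamir_split(secret, k, n, p=PRIME):
    """Split `secret` into n shares so that any k of them reconstruct it."""
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret % p] + [secrets.randbelow(p) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):          # distinct, non-zero abscissas
        y = 0
        for c in reversed(coeffs):     # Horner's rule
            y = (y * x + c) % p
        shares.append((x, y))
    return shares

def shamir_reconstruct(shares, p=PRIME):
    """f(0) = sum_i y_i * prod_{m != i} x_m / (x_m - x_i)  (mod p)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != i:
                num = num * xm % p
                den = den * (xm - xi) % p
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret

shares = shamir_split(123456, k=3, n=5)
print(shamir_reconstruct(shares[:3]), shamir_reconstruct(shares[2:]))  # 123456 123456
```

Storing the $n$ shares at $n$ different servers multiplies the storage by $n$, which is the overhead S4 avoids by keeping all but one split at a single CSP.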
2. Methodology 


First, $x_k$ and $v_k$ are randomly chosen from $\mathbb{F}_p$, where $p$ is a big prime number, i.e., greater than the greatest possible query answer. For any secret $v_j$, a random polynomial $P_{v_j}(x)$ of degree $k-1$ is built that passes through $(0, v_j)$ and $(x_k, v_k)$. To this end, $k-2$ points $(a_i, b_i)$, $i = 1, \ldots, k-2$, are chosen randomly from $\mathbb{F}_p$, with pairwise distinct $a_i$, such that $a_i \neq x_k$ and $a_i \neq 0$ $\forall i = 1, \ldots, k-2$. Given the $k$ points $(a_1, b_1), (a_2, b_2), \ldots, (a_{k-2}, b_{k-2})$, $(0, v_j)$ and $(x_k, v_k)$, polynomial $P_{v_j}(x)$ is built. Storing the $k-2$ random points is unnecessary because they are not needed for secret reconstruction. To divide $v_j$ into $k$ splits, $k-1$ pairwise distinct abscissas $X = (x_1, \ldots, x_{k-1})$ are chosen from $\mathbb{F}_p$ such that $x_i \neq 0$ and $x_i \neq x_k$ $\forall i = 1, \ldots, k-1$. Then, the splits are $v_{ij} = P_{v_j}(x_i)$. $K = (X, (x_k, v_k))$ is considered as the private key of S4 and must be kept hidden from the CSP.

To reconstruct secret $v_j$, its $k-1$ splits must be retrieved from the CSP. Given points $(x_i, v_{ij})$, $i = 1, \ldots, k-1$, and $(x_k, v_k)$, which is stored at the user's, polynomial $P_{v_j}(x)$ can be reconstructed, and $v_j = P_{v_j}(0)$.
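The splitting and reconstruction steps can be sketched in Python as follows. This is a minimal illustration under our own assumptions (prime $p = 2^{61} - 1$, Python's `secrets` module for randomness, plain $O(k^2)$ Lagrange interpolation, hypothetical function names), not the authors' C/GMP implementation.

```python
import secrets

_rng = secrets.SystemRandom()

def lagrange_eval(points, x, p):
    """Evaluate at `x` the unique polynomial of degree < len(points) through `points`, mod p."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != i:
                num = num * (x - xm) % p
                den = den * (xi - xm) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

def s4_keygen(k, p):
    """Private key K = (X, (x_k, v_k)) with X = (x_1, ..., x_{k-1}).
    All k abscissas are drawn distinct and non-zero."""
    xs = _rng.sample(range(1, p), k)
    return xs[:-1], (xs[-1], secrets.randbelow(p))

def s4_split(v_j, key, p):
    """Return the k-1 splits v_1j, ..., v_(k-1)j to be stored at the CSP."""
    X, (x_k, v_k) = key
    k = len(X) + 1
    # k-2 random points (a_i, b_i) with distinct a_i, none equal to 0 or x_k.
    a_values = set()
    while len(a_values) < k - 2:
        a = 1 + secrets.randbelow(p - 1)
        if a != x_k:
            a_values.add(a)
    points = [(a, secrets.randbelow(p)) for a in a_values]
    points += [(0, v_j % p), (x_k, v_k)]   # k points: P_{v_j} has degree k - 1
    return [lagrange_eval(points, x_i, p) for x_i in X]

def s4_reconstruct(splits, key, p):
    """Interpolate (x_i, v_ij) and (x_k, v_k), then evaluate at x = 0."""
    X, (x_k, v_k) = key
    return lagrange_eval(list(zip(X, splits)) + [(x_k, v_k)], 0, p)

p = 2**61 - 1          # example prime; must exceed any query answer
key = s4_keygen(8, p)  # k = 8
splits = s4_split(1234, key, p)
print(len(splits), s4_reconstruct(splits, key, p))  # 7 1234
```

The random points $(a_i, b_i)$ are discarded after splitting: the $k-1$ splits plus $(x_k, v_k)$ already form $k$ points of the same polynomial, which is enough to interpolate it.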


Let a relational table $T$ consist of one attribute $A$ (additional attributes, if any, can be processed similarly). Suppose $T$ has $m$ records. We denote by $v_j$ the $j$-th value of $A$. For attribute $A$ in $T$, $k-1$ attributes $A_i$, $i = 1, \ldots, k-1$, are created in table $T'$ at the CSP's, where each attribute $A_i$ stores the $i$-th splits $v_{ij}$. Without loss of generality, we assume an integer data type for $A$; other data types can be transformed into integers before splitting. S4 allows summation queries to be computed directly at the CSP's. Consider a query that sums $q$ values of $A$.
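A minimal Python sketch of such a summation query, under our own assumptions (prime $p = 2^{61} - 1$, `secrets` randomness, hypothetical function names). The CSP only adds the split attributes $A_i$ column-wise modulo $p$. Since all $q$ polynomials pass through $(x_k, v_k)$, their sum passes through $(x_k, q \cdot v_k)$, which the user combines with the $k-1$ summed splits to interpolate the answer at $x = 0$. The result equals the integer sum only if $p$ exceeds it.

```python
import secrets

_rng = secrets.SystemRandom()

def lagrange_eval(points, x, p):
    """Value at x of the polynomial interpolating `points` over F_p."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != i:
                num, den = num * (x - xm) % p, den * (xi - xm) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

def s4_keygen(k, p):
    xs = _rng.sample(range(1, p), k)          # k distinct, non-zero abscissas
    return xs[:-1], (xs[-1], secrets.randbelow(p))

def s4_split(v_j, key, p):
    X, (x_k, v_k) = key
    k = len(X) + 1
    # k-2 distinct random abscissas a_i, none equal to 0 or x_k
    a_values = [a for a in _rng.sample(range(1, p), k - 1) if a != x_k][:k - 2]
    points = [(a, secrets.randbelow(p)) for a in a_values] + [(0, v_j % p), (x_k, v_k)]
    return [lagrange_eval(points, x_i, p) for x_i in X]

def csp_sum(split_rows, p):
    """At the CSP: add each split attribute A_i over the selected rows, mod p."""
    return [sum(column) % p for column in zip(*split_rows)]

def user_decode_sum(summed_splits, q, key, p):
    """At the user's: the sum of q polynomials passes through (x_k, q * v_k)."""
    X, (x_k, v_k) = key
    return lagrange_eval(list(zip(X, summed_splits)) + [(x_k, q * v_k % p)], 0, p)

p = 2**61 - 1
key = s4_keygen(16, p)
values = [1500, 2750, 9999, 1001]                  # q = 4 values of A
rows = [s4_split(v, key, p) for v in values]        # rows of T' at the CSP
print(user_decode_sum(csp_sum(rows, p), len(values), key, p))  # 15250
```

The CSP never needs the key: it performs $(k-1) \times q$ modular additions, and the user performs a single interpolation whatever $q$ is.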


Paillier's PHE is semantically secure, but it is too expensive in terms of ciphertext storage space and query response time. S4 makes a classical trade-off: a lower level of security in exchange for better storage and response time efficiency. Let us consider a scenario where the CSP is honest but curious, which is a widely used adversary model for cloud data outsourcing. Such a CSP faithfully complies with any service-level agreement and, in our particular case, stores data, runs queries and provides results without alteration, malicious or otherwise. Yet, the CSP may access data and infer information from queries and results.


Privacy in S4 relies on the fact that a secret value is only retrievable by the user via private key $K$. As in secret sharing, it is indeed guaranteed that at least $k$ splits and $X$ are necessary to reconstruct a secret, while the CSP has access to only $k-1$ splits. Both $X$ and the $k$-th split, i.e., $K$, are stored at the user's. However, the CSP still has access to linear combinations of splits, which provide some information. Still, the higher $k$ is, the more difficult it is to interpret linear combinations of splits. Thus, $k$ is the prime security parameter in S4. Experiments provide hints for choosing $k$.


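Moreover, some secrets may be known to the CSP, e.g., through public communication of a company to its shareholders. Every secret satisfies $v_j = \sum_{i=1}^{k-1} \ell_i(0)\, v_{ij} + \ell_k(0)\, v_k$, where $\ell_1, \ldots, \ell_k$ are the Lagrange basis polynomials over $x_1, \ldots, x_k$ and do not depend on $j$. For example, if the CSP knows secrets $v_1, \ldots, v_{k-1}$ and also the corresponding splits $v_{ij}$, $\forall i, j \in [1, k-1]$, the CSP can recover the values $\ell_i(0)$ of the Lagrange basis polynomials, $\forall i \in [1, k]$, and recover all secrets. However, the CSP must know at least $k-1$ secrets to do so. We also propose leads to address this problem in Section 4.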
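To illustrate this leak, here is a minimal Python sketch of a known-secret attack under our own assumptions (hypothetical function names, prime $p = 2^{61} - 1$). Since the CSP knows neither $X$ nor $v_k$, this particular sketch treats $\ell_1(0), \ldots, \ell_{k-1}(0)$ and the constant $\ell_k(0)\, v_k$ as $k$ unknowns and solves a $k \times k$ linear system modulo $p$ built from $k$ known secrets (which must not all be equal, otherwise the system is singular). It is one possible attack, not necessarily the one the authors analyze.

```python
import secrets

_rng = secrets.SystemRandom()

def lagrange_eval(points, x, p):
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != i:
                num, den = num * (x - xm) % p, den * (xi - xm) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

def s4_keygen(k, p):
    xs = _rng.sample(range(1, p), k)
    return xs[:-1], (xs[-1], secrets.randbelow(p))

def s4_split(v_j, key, p):
    X, (x_k, v_k) = key
    k = len(X) + 1
    a_values = [a for a in _rng.sample(range(1, p), k - 1) if a != x_k][:k - 2]
    points = [(a, secrets.randbelow(p)) for a in a_values] + [(0, v_j % p), (x_k, v_k)]
    return [lagrange_eval(points, x_i, p) for x_i in X]

def solve_mod_p(A, b, p):
    """Gauss-Jordan elimination over F_p; assumes the square matrix A is invertible."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] % p)  # StopIteration if singular
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)
        M[col] = [v * inv % p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(vr - f * vc) % p for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def learn_coefficients(known, p):
    """known: list of (v_j, splits_j). Unknowns: l_1(0), ..., l_{k-1}(0), c = l_k(0) * v_k."""
    A = [list(splits) + [1] for _, splits in known]
    b = [v for v, _ in known]
    return solve_mod_p(A, b, p)

def decode_without_key(splits, coeffs, p):
    *lambdas, c = coeffs
    return (sum(l * s for l, s in zip(lambdas, splits)) + c) % p

p = 2**61 - 1
k = 6
key = s4_keygen(k, p)                                   # never revealed to the CSP
known = [(v, s4_split(v, key, p)) for v in (1100, 2200, 3300, 4400, 5500, 6600)]
coeffs = learn_coefficients(known, p)
target = s4_split(7777, key, p)                         # splits of an unknown secret
print(decode_without_key(target, coeffs, p))            # 7777, with overwhelming probability
```

The number of known secrets this attack needs grows linearly with $k$, which is consistent with $k$ acting as the main security parameter.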
3. Results 


We implement S4 in C using the gcc 4.8.2 compiler. S4's source code is freely available online. Experiments related to Paillier's PHE exploit the libpaillier C library. All mathematical computations use the GNU Multiple Precision Arithmetic Library (GMP). Finally, we conduct our experiments on an Intel Core i7 3.10 GHz PC with 16 GB of RAM running Linux Ubuntu 15.05. We compare S4 and Paillier's PHE using simple synthetic datasets, i.e., 32-bit unsigned integers generated uniformly at random from the integer range $[10^3, 10^4]$. We scale up the number of records $m$ such that $m \in \{10^3, 10^4, 10^5, 10^6\}$, forming four distinct datasets. In S4, we vary $k$ from 8 to 64, higher values of $k$ inducing too long execution times. Prime $p$ must be greater than the greatest query answer, e.g., $p > \sum_{j=1}^{m} v_j$. In Paillier's PHE, we use a key size of 1024 bits, which induces ciphertexts of 2048 bits. Such a key size is the absolute minimum to achieve security.
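The constraint on $p$ can be made concrete with a small Python helper (hypothetical, not part of the authors' code) that returns the smallest prime exceeding the largest possible sum of $m$ values drawn from $[10^3, 10^4]$, i.e., $m \times 10^4$, assuming the upper bound of the range is inclusive. Trial division is enough at these magnitudes, since $p$ stays around 34 bits even for $m = 10^6$.

```python
def is_prime(n):
    """Deterministic trial division; fine for the ~34-bit moduli needed here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def choose_modulus(m, max_value):
    """Smallest prime p > m * max_value, i.e., greater than any possible SUM over m values."""
    p = m * max_value + 1
    while not is_prime(p):
        p += 1
    return p

for m in (10**3, 10**4, 10**5, 10**6):
    p = choose_modulus(m, 10**4)
    print(m, p, p.bit_length())
```

A larger prime also works; the lower bound only guarantees that sums computed modulo $p$ equal the integer sums.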


It is seen that encryption time in S4 is lower than Paillier's when $k \le 16$, and then becomes higher when $k > 16$. Secret splitting consists in building a random polynomial by randomly choosing $k-2$ points. Hence, splitting time increases with $k$. This actually illustrates the trade-off between S4's security and encryption efficiency with respect to Paillier's PHE. With the selected values of $k$, decryption is faster with S4 than with Paillier's PHE. This is mainly because Paillier's scheme needs $m$ expensive modular multiplications of large, 2048-bit numbers for decryption, while secret reconstruction in S4 works by polynomial interpolation over $k$ points and evaluating the polynomial at one single point.
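Because the abscissas $x_1, \ldots, x_k$ are fixed by private key $K$, the values $\ell_i(0)$ of the Lagrange basis polynomials are the same for every secret. A possible optimization, which we assume here and which may differ from the authors' implementation, precomputes them once so that each reconstruction costs only about $k$ modular multiplications and additions. Minimal Python sketch with hypothetical names and $p = 2^{61} - 1$:

```python
import secrets

_rng = secrets.SystemRandom()

def lagrange_eval(points, x, p):
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != i:
                num, den = num * (x - xm) % p, den * (xi - xm) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

def s4_keygen(k, p):
    xs = _rng.sample(range(1, p), k)
    return xs[:-1], (xs[-1], secrets.randbelow(p))

def s4_split(v_j, key, p):
    X, (x_k, v_k) = key
    k = len(X) + 1
    a_values = [a for a in _rng.sample(range(1, p), k - 1) if a != x_k][:k - 2]
    points = [(a, secrets.randbelow(p)) for a in a_values] + [(0, v_j % p), (x_k, v_k)]
    return [lagrange_eval(points, x_i, p) for x_i in X]

def precompute_coefficients(key, p):
    """l_i(0) for the fixed abscissas x_1, ..., x_k; computed once per key."""
    X, (x_k, _) = key
    xs = list(X) + [x_k]
    coeffs = []
    for i, xi in enumerate(xs):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != i:
                num = num * xm % p            # (0 - x_m) / (x_i - x_m) = x_m / (x_m - x_i)
                den = den * (xm - xi) % p
        coeffs.append(num * pow(den, -1, p) % p)
    return coeffs

def s4_decrypt_fast(splits, key, coeffs, p):
    """One dot product: v_j = sum_i l_i(0) * v_ij + l_k(0) * v_k  (mod p)."""
    _, (_, v_k) = key
    *lam, lam_k = coeffs
    return (sum(l * s for l, s in zip(lam, splits)) + lam_k * v_k) % p

p = 2**61 - 1
key = s4_keygen(32, p)
coeffs = precompute_coefficients(key, p)
values = [1234, 5678, 9012]
print([s4_decrypt_fast(s4_split(v, key, p), key, coeffs, p) for v in values])  # [1234, 5678, 9012]
```

Splitting still requires building a fresh random polynomial per secret, so its cost grows with $k$, whereas the precomputed reconstruction is a single dot product of length $k$.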


With the selected values of $k$, S4's storage overhead is always much smaller than that of Paillier's PHE (note that the y axis of the storage plot follows a logarithmic scale). Paillier's scheme indeed produces 2048-bit ciphertexts. Thus, its storage overhead is $m \times 2048$ bits. With S4, each value is split into $k-1$ values stored at the CSP's. Thus, S4's storage overhead is $m \times (k-1)$ times the plaintext size. It is seen that, with the selected values of $k$, query execution time in S4 is lower than that of Paillier's scheme. This is because Paillier's scheme requires $m$ expensive modular multiplications to compute a sum, while S4 computes only $(k-1) \times m$ simple modular additions.
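The two storage formulas can be compared with a few lines of Python (hypothetical helper names). The text measures an S4 split at plaintext size (32 bits here); since splits are elements of $\mathbb{F}_p$, a stored split may in practice need $\lceil \log_2 p \rceil$ bits, so the split width is left as a parameter. Per value, the ratio is $2048 / ((k-1) \times w)$, which shrinks as $k$ grows.

```python
def paillier_storage_bits(m, ciphertext_bits=2048):
    """m ciphertexts of 2048 bits each (1024-bit key)."""
    return m * ciphertext_bits

def s4_storage_bits(m, k, split_bits):
    """k - 1 splits per value stored at the CSP, each split_bits wide."""
    return m * (k - 1) * split_bits

m = 10**6
for k in (8, 16, 32, 64):
    s4_mb = s4_storage_bits(m, k, split_bits=32) / 8 / 10**6
    paillier_mb = paillier_storage_bits(m) / 8 / 10**6
    print(f"k={k}: S4 {s4_mb:.0f} MB vs Paillier {paillier_mb:.0f} MB")
```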
4. Conclusions 


We achieved performance gains through a slight degradation of security, especially when an adversary has knowledge of secret values. Although this is satisfactory in some cloud DW and OLAP scenarios, e.g., when public aggregate data do not actually reveal secrets, i.e., fine-grained data, we will devote future research to strengthening S4 against such threats. More precisely, we plan to introduce noise, as in many cryptographic problems such as approximate-GCD or Learning With Errors (LWE).